Proper name knowledge acquisition for text understanding
Author
Abstract
Current work in proper name analysis focuses on the identification and limited categorisation of names. Some research has been carried out on acquiring knowledge of proper names from contextual information within texts. In this study, we investigate how to transform human-oriented compilations, which contain a rich knowledge of proper names, into formally represented knowledge for computer consumption. If a dictionary is considered the source of knowledge about common nouns, an encyclopaedia should be regarded as the source of knowledge about proper nouns. Compared with the amount of work on knowledge acquisition from dictionaries, little work has been carried out on extracting knowledge about proper nouns from encyclopaedias. Here we discuss the knowledge to which proper names are related and present our methods for knowledge acquisition from dictionaries of biography. Our analysis of biographical entries shows that repetitive patterns are indeed used in biographies, in both English and Chinese. These patterns make it possible to transform knowledge of proper names from text descriptions into encoded knowledge. Taking a sublanguage approach, we report how the content of a dictionary of biography can be mapped onto a knowledge base with a minimum of human intervention. ABKAS, Archetype of a Biographical Knowledge Acquisition System, has been implemented. It comprises a pre-processing unit, a sublanguage parser and a knowledge base constructor. The key component of ABKAS is its sublanguage parser, which consists of a number of finite state machines. Based on the local grammar we have identified, the sublanguage parser achieves the syntax-semantic mapping. Biographical entries in both English and Chinese are parsed, and their corresponding logical forms are generated and further represented in knowledge bases.
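To make the syntax-semantic mapping concrete, the following is a minimal sketch of how one repetitive entry pattern could be turned into logical forms. It is not the ABKAS implementation: the entry layout, the pattern, the function names (parse_entry, to_logical_form) and the predicate names are all assumptions made for illustration, and a single regular expression stands in for the finite state machines of the sublanguage parser. The sample entry is an illustrative string, not one taken from the dictionaries used in the study.

```python
import re
from typing import Dict, List, Optional

# Hypothetical local-grammar pattern for one assumed English entry shape:
#   "Surname, Forenames (born-died), Nationality occupation."
# A regular expression of this kind is equivalent to a finite state recogniser.
ENTRY = re.compile(
    r"^(?P<surname>[^,]+),\s*"
    r"(?P<forenames>[^(]+?)\s*"
    r"\((?P<born>\d{4})\s*-\s*(?P<died>\d{4})\),\s*"
    r"(?P<nationality>[A-Z][a-z]+)\s+"
    r"(?P<occupation>[a-z ]+)\.$"
)


def parse_entry(text: str) -> Optional[Dict[str, object]]:
    """Map one entry onto a flat frame, or return None if the pattern fails."""
    match = ENTRY.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "name": f"{fields['forenames']} {fields['surname']}",
        "born": int(fields["born"]),
        "died": int(fields["died"]),
        "nationality": fields["nationality"],
        "occupation": fields["occupation"],
    }


def to_logical_form(frame: Dict[str, object]) -> List[str]:
    """Render the frame as predicate strings (predicate names are assumed)."""
    ident = str(frame["name"]).lower().replace(" ", "_")
    return [
        f"person({ident})",
        f"born({ident}, {frame['born']})",
        f"died({ident}, {frame['died']})",
        f"nationality({ident}, {str(frame['nationality']).lower()})",
        f"occupation({ident}, {str(frame['occupation']).replace(' ', '_')})",
    ]


if __name__ == "__main__":
    frame = parse_entry("Austen, Jane (1775-1817), English novelist.")
    if frame is not None:
        for fact in to_logical_form(frame):
            print(fact)
```

An entry that does not fit the pattern returns None rather than a partial frame, which would leave such cases for human inspection. A fuller system would presumably need one pattern per entry shape, and separate patterns for Chinese.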
Similar resources
A Natural Language Understanding System for Knowledge-Based Analysis of Medical Texts
An approach to knowledge-based understanding of real-world texts from the medical domain (viz. gastro-intestinal findings) is presented. We survey major methodological features of an object-oriented, fully lexicalized, dependency-based grammar model which is tightly linked to domain knowledge representations based on description logics. The parser adheres to the principles of robustness, increm...
Structural Domain Modeling for Understanding Equipment Failure Messages
needed to dereference the names and descriptions of equipment referred to in the text, and to infer their causal relations and operational states when these are only implicitly expressed by the message writer. Knowing the structural configuration of the equipment is useful in both tasks, and a structural domain model can be extracted readily from equipment manuals and their accompanying pa...
Identifying Unknown Proper Names In Newswire Text
The identification of unknown proper names in text is a significant challenge for NLP systems operating on unrestricted text. A system which indexes documents according to name references can be useful for information retrieval or as a preprocessor for more knowledge intensive tasks such as database extraction. This paper describes a system which uses text skimming techniques for deriving prope...
Acquisition Of Lexical Paraphrases From Texts
Automatic acquisition of paraphrase knowledge for content words is proposed. Using only a non-parallel text corpus, we compute the paraphrasability metrics between two words from their similarity in context. We then filter words such as proper nouns from external knowledge. Finally, we use a heuristic in further filtering to improve the accuracy of the automatic acquisition. In this paper, we r...
Automatic knowledge acquisition from medical texts.
An approach to knowledge-based understanding of realistic texts from the medical domain (viz. findings of gastro-intestinal diseases) is presented. We survey major methodological features of an object-oriented, fully lexicalized, dependency-based grammar model which is tightly linked to domain knowledge representations based on description logics. The parser adheres to the principles of robustn...